Centrality: Mean, Median, Mode
Sample: 2, 4.4, 3, 3, 2, 2.2, 2, 4
Sorted: 2, 2, 2, 2.2, 3, 3, 4, 4.4 (n = 8, n/2 = 4, so the median is the mean of the 4th and 5th sorted values)
xdata <- c(2,4.4,3,3,2,2.2,2,4)
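The three measures can be computed directly on xdata. A minimal sketch: mean() and median() are base R functions, but base R has no built-in for the statistical mode, so the table()-based lookup below is just one common workaround (the names xtab and xmode are illustrative).

```r
xdata <- c(2,4.4,3,3,2,2.2,2,4)

x_mean <- mean(xdata)      # arithmetic mean
x_median <- median(xdata)  # middle value of the sorted data

# Mode: tabulate the values and keep the most frequent one(s)
xtab <- table(xdata)
xmode <- as.numeric(names(xtab)[xtab == max(xtab)])

print(c(mean = x_mean, median = x_median, mode = xmode))
```

If several values tie for the highest count, xmode holds all of them.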
Quantiles, Percentiles, and the Five-Number Summary
The median is the 0.5 quantile, which is the same as the 50th percentile.
Sample: 2, 4.4, 3, 3, 2, 2.2, 2, 4
Sorted: 2, 2, 2, 2.2, 3, 3, 4, 4.4
0.5 quantile = median = (2.2 + 3)/2 = 2.6
xdata <- c(2,4.4,3,3,2,2.2,2,4)
quantile(xdata,probs=0.8) # the 0.8 quantile (or 80th percentile)
## 80%
## 3.6
quantile(xdata,probs=c(0,0.25,0.5,0.75,1))
## 0% 25% 50% 75% 100%
## 2.00 2.00 2.60 3.25 4.40
summary(xdata)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 2.600 2.825 3.250 4.400
A quartile is a type of quantile.
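Quartiles are the 0.25, 0.5 and 0.75 quantiles; together with the minimum and maximum they make up the five-number summary. A small sketch on the same sample. Note that fivenum() is based on Tukey's hinges, so its quartiles can differ slightly from the quantile() defaults.

```r
xdata <- c(2,4.4,3,3,2,2.2,2,4)

# Quartiles: the 0.25, 0.5 and 0.75 quantiles
q <- quantile(xdata, probs = c(0.25, 0.5, 0.75))
print(q)

# Five-number summary via quantile(): min, Q1, median, Q3, max
five <- quantile(xdata, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five)

# fivenum() uses Tukey's hinges instead of quantile()'s default rule
hinges <- fivenum(xdata)
print(hinges)
```

For this sample the two approaches agree except for the upper quartile (3.25 from quantile(), 3.5 from fivenum()).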
Spread: Variance, Standard Deviation, and the Interquartile Range
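xdata <- c(2,4.4,3,3,2,2.2,2,4)
ydata <- c(1,4.4,1,3,2,2.2,2,7) # same values as y in the Covariance and Correlation slides
mean(xdata)
## [1] 2.825
mean(ydata)
## [1] 2.825
plot(xdata,type="n",xlab="",ylab="data vector",yaxt="n",bty="n")
abline(h=c(3,3.5),lty=2,col="red")
abline(v=2.825,lwd=2,lty=3)
text(c(0.8,0.8),c(3,3.5),labels=c("x","y"))
points(jitter(c(xdata,ydata)),c(rep(3,length(xdata)), rep(3.5,length(ydata))))
Both vectors have the same mean, but the observations in ydata are more “spread out” around it.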
Sample: 2, 4.4, 3, 3, 2, 2.2, 2, 4 (mean = 2.825)
The standard deviation, 0.953, is roughly the typical distance of an observation from the mean.
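To see where 0.953 comes from, the sample variance and standard deviation can be rebuilt step by step. This is a sketch of the usual n − 1 formulas. Strictly, the standard deviation is the square root of the average squared deviation, so "average distance" is a loose reading.

```r
xdata <- c(2,4.4,3,3,2,2.2,2,4)
n <- length(xdata)

dev <- xdata - mean(xdata)   # deviations from the mean 2.825
s2 <- sum(dev^2) / (n - 1)   # sample variance
s <- sqrt(s2)                # sample standard deviation

# The built-in functions should agree with the manual calculation
print(c(manual_var = s2, builtin_var = var(xdata)))
print(c(manual_sd = s, builtin_sd = sd(xdata)))
```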
var(xdata)
## [1] 0.9078571
sd(xdata)
## [1] 0.9528154
IQR(xdata)
## [1] 1.25
Covariance and Correlation
x = {x1, x2, …, xn}
y = {y1, y2, …, yn}
The observations are paired: (xi, yi) for i = 1, …, n.
A positive sample covariance rxy indicates a positive linear relationship, a negative rxy indicates a negative one, and rxy = 0 indicates no linear relationship.
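For reference, the sample covariance is usually written as follows, where x̄ and ȳ denote the two sample means and the symbol r_xy follows the notation of these slides:

```latex
r_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

Each term of the sum is positive when x_i and y_i lie on the same side of their means, which is why a positive total points to a positive linear relationship.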
x = {2,4.4,3,3,2,2.2,2,4}
y = {1,4.4,1,3,2,2.2,2,7}
mean of x = mean of y = 2.825
rxy = 1.479 > 0: positive linear relationship
The most common correlation measure is Pearson’s product-moment correlation coefficient (the R default).
The correlation coefficient measures the direction and strength of the linear relationship between two sets of observations.
−1 ≤ ρxy ≤ 1
ρxy = 1 is a perfect positive linear relationship, ρxy = −1 is a perfect negative one, and ρxy = 0 indicates no linear relationship.
x = {2,4.4,3,3,2,2.2,2,4}
y = {1,4.4,1,3,2,2.2,2,7}
(mean of x = mean of y = 2.825)
(sx = 0.953 and sy = 2.013)
(rxy = 1.479)
ρxy = rxy / (sx sy) = 1.479 / (0.953 × 2.013) ≈ 0.771, so ρxy is positive
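The numbers on this slide can be reproduced by hand. A minimal sketch, assuming the standard n − 1 sample covariance and Pearson's coefficient; the variable names rxy and rhoxy simply mirror the slide's symbols.

```r
x <- c(2,4.4,3,3,2,2.2,2,4)
y <- c(1,4.4,1,3,2,2.2,2,7)
n <- length(x)

# Sample covariance: sum of paired deviation products over n - 1
rxy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Pearson correlation: covariance rescaled by both standard deviations
rhoxy <- rxy / (sd(x) * sd(y))

print(c(covariance = rxy, correlation = rhoxy))
```

Unlike the covariance, the correlation is unit-free, which is what makes values from different data sets comparable.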
x <- c(2,4.4,3,3,2,2.2,2,4)
y <- c(1,4.4,1,3,2,2.2,2,7)
plot(x,y, col="red", pch=13,cex=1.5)
abline(lm(y ~ x))
cov(x,y)
## [1] 1.479286
cor(x,y)
## [1] 0.7713962
Barplots and Pie Charts
station_data <- read.csv("https://web.itu.edu.tr/~tokerem/18397_Cekmekoy_Omerli_15dk.txt", header=T, sep = ";")
head(station_data)
## sta_no year month day hour minutes temp precipitation pressure
## 1 18397 2017 7 26 18 0 23.9 0 1003.0
## 2 18397 2017 7 26 18 15 23.9 0 1003.1
## 3 18397 2017 7 26 18 30 23.8 0 1003.2
## 4 18397 2017 7 26 18 45 23.8 0 1003.2
## 5 18397 2017 7 26 19 0 23.6 0 1003.2
## 6 18397 2017 7 26 19 15 23.2 0 1003.1
## relative_humidity
## 1 94
## 2 95
## 3 96
## 4 96
## 5 96
## 6 97
head(station_data$temp)
## [1] 23.9 23.9 23.8 23.8 23.6 23.2
station_data$temp
## [1] 23.9 23.9 23.8 23.8 23.6 23.2 23.2 23.1 23.0 22.8 22.5 22.4 22.2 22.3
## [15] 22.2 21.7 21.9 21.7 21.6 22.2 22.2 22.1 22.3 22.5 22.3 22.2 22.5 22.6
## [29] 22.6 22.6 22.6 22.7 22.6 22.5 22.6 22.5 22.5 22.4 22.5 22.4 22.5 22.6
## [43] 23.0 23.2 24.2 25.1 25.5 26.1 27.1 26.9 27.6 28.0 28.4 28.5 29.3 30.2
## [57] 30.1 30.1 30.4 30.4 30.8 30.9 31.0 31.5 31.2 30.9 30.9 30.4 30.4 30.0
## [71] 29.2 29.5 29.4 29.3 29.6 28.8 29.0 29.0 29.2 28.4 27.8 27.4 26.6 26.2
## [85] 25.8 25.6 25.4 24.2 19.2 19.5 20.1 20.8 21.2 21.4 21.4 21.4 21.2 21.0
## [99] 20.8 20.9 20.8 20.7 20.8 20.8 20.9 20.6 20.6 20.5 20.7 20.8 20.4 20.4
## [113] 20.6 20.5 20.4 20.5 20.5 20.6 20.5 20.5 20.4
f_temp <- table(station_data$temp) # frequency of each temperature value
f_temp
##
## 19.2 19.5 20.1 20.4 20.5 20.6 20.7 20.8 20.9 21 21.2 21.4 21.6 21.7 21.9
## 1 1 1 4 6 4 2 6 2 1 2 3 1 2 1
## 22.1 22.2 22.3 22.4 22.5 22.6 22.7 22.8 23 23.1 23.2 23.6 23.8 23.9 24.2
## 1 5 3 3 8 7 1 1 2 1 3 1 2 2 2
## 25.1 25.4 25.5 25.6 25.8 26.1 26.2 26.6 26.9 27.1 27.4 27.6 27.8 28 28.4
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
## 28.5 28.8 29 29.2 29.3 29.4 29.5 29.6 30 30.1 30.2 30.4 30.8 30.9 31
## 1 1 2 2 2 1 1 1 1 2 1 4 1 3 1
## 31.2 31.5
## 1 1
barplot(f_temp,beside=TRUE,horiz=TRUE,las=1,
main="Frequency of Station Temperature",
names.arg=c("T"),legend.text=c("TEMP-f"),
args.legend=list(x="bottomright"))
library(ggplot2)
head(station_data$precipitation)
pie(table(station_data$precipitation),labels=c("V1","V2","V3","V4","V5"),col=c("white","blue","green","orange"),main="pie chart for precipitation")
Histogram
hist(station_data$temp,breaks=seq(19,32,1),col="green",main="Temp",xlab="Temp")
abline(v=c(mean(station_data$temp),median(station_data$temp)), col=c("blue","red"),lty=c(2,3),lwd=2)
legend("topright",legend=c("mean T","median T"),lty=c(2,3),lwd=2,col=c("blue","red"))
qplot(station_data$temp,geom="blank",main="Temp Hist",xlab="Temp")+
geom_histogram(color="black",fill="white",breaks=seq(19,32,1),closed="right") +
geom_vline(mapping=aes(xintercept=c(mean(station_data$temp), median(station_data$temp)), linetype=factor(c("mean","median"))) , col=c("blue","red"),show.legend=TRUE)+
scale_linetype_manual(values=c(2,3)) +
labs(linetype="")
Boxplot
mean(station_data$temp)
## [1] 24.24132
median(station_data$temp)
## [1] 22.6
quantile(station_data$temp)
## 0% 25% 50% 75% 100%
## 19.2 21.4 22.6 27.6 31.5
Histogram and Boxplot
Boxplot
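A boxplot draws the five-number summary as a box with whiskers. A minimal sketch that does not need the station file: temp_sample (an illustrative name) holds only the first 14 temperatures from the printed station_data$temp output, so its box will not match the full-data quartiles.

```r
# First 14 temperature readings copied from the printed output (not the full data)
temp_sample <- c(23.9,23.9,23.8,23.8,23.6,23.2,23.2,23.1,23.0,22.8,22.5,22.4,22.2,22.3)

# Horizontal boxplot; the returned list holds the plotted statistics
bp <- boxplot(temp_sample, horizontal = TRUE,
              main = "Temp (first 14 readings)", xlab = "Temp")

# bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)
```

With the full data loaded, boxplot(station_data$temp) would be the equivalent call.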
Scatter Plots
## temp precipitation pressure relative_humidity
## 1 23.9 0 1003.0 94
## 2 23.9 0 1003.1 95
## 3 23.8 0 1003.2 96
## 4 23.8 0 1003.2 96
## 5 23.6 0 1003.2 96
## 6 23.2 0 1003.1 97
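A scatter plot shows one numeric variable against another. The sketch below rebuilds only the six printed rows in a small data frame (obs is an illustrative name) and leaves out precipitation, which is constant in these rows. It is not the full station data.

```r
# Six rows copied from the printed output above (not the full data set)
obs <- data.frame(
  temp = c(23.9,23.9,23.8,23.8,23.6,23.2),
  pressure = c(1003.0,1003.1,1003.2,1003.2,1003.2,1003.1),
  relative_humidity = c(94,95,96,96,96,97)
)

# Single scatter plot: relative humidity against temperature
plot(obs$temp, obs$relative_humidity,
     xlab = "Temp", ylab = "Relative humidity", pch = 19)

# Scatter plot matrix of every pair of columns
pairs(obs)

# In these six rows humidity rises as temperature falls
r_th <- cor(obs$temp, obs$relative_humidity)
print(r_th)
```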
A probability is a number that describes the “magnitude of chance” associated with making a particular observation or statement.
It’s always a number between 0 and 1 (inclusive) and is often expressed as a fraction.
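As a concrete case, take the sum of two fair six-sided dice, which appears to be the source of the 1/36, 2/36, … values used on the slides. A sketch that enumerates all 36 equally likely outcomes:

```r
# All 36 equally likely (die 1, die 2) sums
sums <- outer(1:6, 1:6, "+")

# Pr(X = x) for x = 2, ..., 12
X.prob <- table(sums) / length(sums)
print(round(X.prob, 4))

# Each probability lies between 0 and 1, and together they sum to 1
total <- sum(X.prob)
print(total)
```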
X.outcomes <- c(2:12)
X.prob <- c((1/36),(2/36),(3/36),(4/36),(5/36),(6/36),(5/36),(4/36),(3/36),(2/36),(1/36))
barplot(X.prob,ylim=c(0,0.20),names.arg=X.outcomes,space=0,xlab="x",ylab="Pr(X = x)", main = "probability distribution")
X.outcomes <- c(2:12)
X.prob <- c((1/36),(2/36),(3/36),(4/36),(5/36),(6/36),(5/36),(4/36),(3/36),(2/36),(1/36))
X.cumul <- cumsum(X.prob)
barplot(X.cumul,names.arg=X.outcomes,space=0,xlab="x",ylab="Pr(X <= x)", main = "cumulative probability distribution")
X.outcomes <- c(2:12)
X.prob <- c((1/36),(2/36),(3/36),(4/36),(5/36),(6/36),(5/36),(4/36),(3/36),(2/36),(1/36))
barplot(X.prob,ylim=c(0,0.20),names.arg=X.outcomes,space=0,xlab="x",ylab="Pr(X = x)", main = "probability distribution")
abline(v=c(0.5:10.5))
f(x) is defined piecewise, with 7 separating the lower and upper ranges:
lower range, X >= 2 & X <= 7: f(x) = (X[lower] - 1)/36
upper range, X > 7 & X <= 12: f(x) = (13 - X[upper])/36
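The piecewise definition can also be written as an R function and checked numerically. A sketch under one assumption: the density is extended linearly down to 0 at x = 1 and x = 13, which matches the line drawn with type="l" on these slides. The names f, total_area and p_below are illustrative.

```r
# Piecewise density: rises on [1, 7], falls on (7, 13], zero elsewhere
f <- function(x) {
  ifelse(x >= 1 & x <= 7, (x - 1)/36,
         ifelse(x > 7 & x <= 13, (13 - x)/36, 0))
}

# The total area under a density should be 1
total_area <- integrate(f, lower = 1, upper = 13)$value

# Pr(X <= 4.5) is the area of the triangle between 1 and 4.5
p_below <- integrate(f, lower = 1, upper = 4.5)$value

print(c(total_area = total_area, p_below = p_below))
```

The second value should agree with the triangle area 3.5 × ((4.5 − 1)/36) × 0.5 that the slides compute as fx.specific.area.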
X.outcomes <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
lower <- X.outcomes >= 2 & X.outcomes <= 7
upper <- X.outcomes > 7 & X.outcomes <= 12
fx <- rep(0,length(X.outcomes))
fx[lower] <- (X.outcomes[lower] - 1)/36
fx[upper] <- (13 - X.outcomes[upper])/36
plot(X.outcomes,fx,type="l",ylab="f(x)", xlim = c(0,14), main = "probability density function")
abline(h=0,col="gray",lty=2)
fx.specific <- (4.5-1)/36
fx.specific.area <- 3.5*fx.specific*0.5
fx.specific.vertices <- rbind(c(1,0),c(4.5,0),c(4.5,fx.specific))
plot(X.outcomes,fx,type="l",ylab="f(x)", xlim = c(0,14), main = "probability density function")
abline(h=0,col="gray",lty=2)
polygon(fx.specific.vertices,col="gray",border=NA)
abline(v=4.5,lty=3)
text(4,0.01,labels=fx.specific.area)
Symmetry : A distribution is symmetric if a vertical line through its center splits it into two mirror-image halves, each carrying probability 0.5.
Skew : If a distribution is asymmetric, look at its “tail”. Positive or right skew indicates a tail extending longer to the right of center; negative or left skew indicates a longer tail to the left.
Modality : Modality describes the number of easily identifiable peaks in the distribution of interest. Unimodal, bimodal, and trimodal distributions have one, two, and three peaks.
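Skew can also be checked numerically. A rough sketch: comparing the mean with the median is only a rule of thumb, and the moment-based coefficient below is one of several definitions in use. skew_coef is an illustrative helper, not a base R function.

```r
# Moment-based sample skewness: third central moment over variance^(3/2)
skew_coef <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

xdata <- c(2,4.4,3,3,2,2.2,2,4)

# A mean above the median hints at a longer right tail
print(c(mean = mean(xdata), median = median(xdata)))

# Positive coefficient: right (positive) skew; near 0: roughly symmetric
sk <- skew_coef(xdata)
print(sk)
```

The station temperatures show the same pattern, with a mean of 24.24 sitting above a median of 22.6.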
as.data.frame(f_temp)
## Var1 Freq
## 1 19.2 1
## 2 19.5 1
## 3 20.1 1
## 4 20.4 4
## 5 20.5 6
## 6 20.6 4
## 7 20.7 2
## 8 20.8 6
## 9 20.9 2
## 10 21 1
## 11 21.2 2
## 12 21.4 3
## 13 21.6 1
## 14 21.7 2
## 15 21.9 1
## 16 22.1 1
## 17 22.2 5
## 18 22.3 3
## 19 22.4 3
## 20 22.5 8
## 21 22.6 7
## 22 22.7 1
## 23 22.8 1
## 24 23 2
## 25 23.1 1
## 26 23.2 3
## 27 23.6 1
## 28 23.8 2
## 29 23.9 2
## 30 24.2 2
## 31 25.1 1
## 32 25.4 1
## 33 25.5 1
## 34 25.6 1
## 35 25.8 1
## 36 26.1 1
## 37 26.2 1
## 38 26.6 1
## 39 26.9 1
## 40 27.1 1
## 41 27.4 1
## 42 27.6 1
## 43 27.8 1
## 44 28 1
## 45 28.4 2
## 46 28.5 1
## 47 28.8 1
## 48 29 2
## 49 29.2 2
## 50 29.3 2
## 51 29.4 1
## 52 29.5 1
## 53 29.6 1
## 54 30 1
## 55 30.1 2
## 56 30.2 1
## 57 30.4 4
## 58 30.8 1
## 59 30.9 3
## 60 31 1
## 61 31.2 1
## 62 31.5 1
For discrete random variables
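There are four R functions associated with the binomial distribution: dbinom, pbinom, qbinom, and rbinom.
dbinom is the density (probability mass) function, and pbinom is the cumulative distribution function.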
## [1] 0.03125
## [1] 0.2734375
## [1] 0.00390625 0.03125000 0.10937500 0.21875000 0.27343750 0.21875000
## [7] 0.10937500 0.03125000 0.00390625
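The printed vector above matches dbinom(0:8, 8, 0.5), so the output is consistent with 8 trials and success probability 0.5. The exact calls are not shown on the slide, so treat the arguments below as an assumption. A sketch of the d/p/q/r family:

```r
n_trials <- 8   # assumed number of trials
p <- 0.5        # assumed success probability

# d: probability mass function, Pr(X = x) for x = 0, ..., 8
pmf <- dbinom(0:n_trials, size = n_trials, prob = p)
print(pmf)

# p: cumulative distribution function, Pr(X <= 4)
cdf_4 <- pbinom(4, size = n_trials, prob = p)

# q: quantile function, smallest x with Pr(X <= x) >= 0.5
q_half <- qbinom(0.5, size = n_trials, prob = p)

# r: random draws (seeded so that the run is repeatable)
set.seed(1)
draws <- rbinom(5, size = n_trials, prob = p)

print(c(cdf_4 = cdf_4, q_half = q_half))
print(draws)
```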
X.outcomes <- c(1:13)
X.prob <- c((0/36),(1/36),(2/36),(3/36),(4/36),(5/36),(6/36),(5/36),(4/36),(3/36),(2/36),(1/36),(0/36))
barplot(X.prob,ylim=c(0,0.20),names.arg=X.outcomes,space=0,xlab="x",ylab="Pr(X = x)", main = "probability distribution")
X.outcomes <- c(1:13)
X.prob <- c((0/36),(1/36),(2/36),(3/36),(4/36),(5/36),(6/36),(5/36),(4/36),(3/36),(2/36),(1/36),(0/36))
barplot(X.prob,ylim=c(0,0.20),names.arg=X.outcomes,space=0,xlab="x",ylab="Pr(X = x)", main = "probability distribution")
lines(dbinom(x = 0:12, size = 36, prob = 1/6), col= "red")
In the Poisson distribution, λp should be interpreted as the “mean number of occurrences”.
There are four R functions associated with the Poisson distribution: dpois, ppois, qpois, and rpois.
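The d/p/q/r family for the Poisson distribution can be sketched in the same way as for the binomial. The rate 2.22 is one of the values used in the slide's dpois() plot; the other names are illustrative.

```r
lambda <- 2.22   # mean number of occurrences

# d: Pr(X = x) for x = 0, ..., 10
pmf <- dpois(0:10, lambda)

# p: Pr(X <= 2)
cdf_2 <- ppois(2, lambda)

# q: smallest x with Pr(X <= x) >= 0.5
q_half <- qpois(0.5, lambda)

# r: random draws (seeded so that the run is repeatable)
set.seed(1)
draws <- rpois(10000, lambda)

print(round(pmf, 4))
print(c(cdf_2 = cdf_2, q_half = q_half, sample_mean = mean(draws)))
```

The sample mean of the random draws should land close to lambda, which is the sense in which λp is the mean number of occurrences.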
plot(dpois(0:10,2.22),type = "o", col="red")
lines(dpois(0:10,4.22), type = "o", col = "blue")
lines(dpois(0:10,7.22), type = "o", col = "green")
R - Common Probability Density Functions
Practice : Write A Function
Practice : Netcdf Packages
QUIZ